Predicting the total PAHs concentrations in sediments from selected congeners using a multiple linear relationship

In this study, we observed that four congeners, including naphthalene (Nap), acenaphthylene (Acy), phenanthrene (Phe), and benz(a)anthracene (BaA), are the characteristic congeners for predicting the emission and the sediment concentrations of polycyclic aromatic hydrocarbons (PAHs). A novel multiple relationship of the total PAHs concentrations (C∑PAHs) in sediments with the concentrations of four congeners was established (p < 0.01, R2 = 0.95) using published data over the past 30 years. Moreover, the multiple linear relationship of the total PAHs emission factors with the emission factors of four congeners was also established (p < 0.01, R2 = 0.99). Interestingly, the ratio of multicomponents coefficient from the multiple linear relationship in sediments to that from the multiple linear relationship in emission sources correlated positively with octanol–water partition coefficient (logKow) (p < 0.01, R2 = 0.88) of the four PAHs congeners. Therefore, a novel model was established to predict CΣPAHs in sediments using the emissions and logKow of the four characteristic PAHs congeners. The percent sample deviation between calculated C∑PAHs and their observed values was 54%, suggesting the established model can accurately predict CΣPAHs in sediments.

Polycyclic aromatic hydrocarbons (PAHs) are a group of persistent organic contaminants that originate from the incomplete combustion of organic matter (such as biomass and coal) and non-combustion emissions of petrogenic processes [1][2][3] . United States Environmental Protection Agency (USEPA) and European Union (EU) have identified sixteen PAHs (Table S1) as priority pollutants due to their toxicity and risks to human health and the environment [4][5][6][7] . For example, PAHs in sediments can pose detrimental effects on benthic organisms and pelagic organisms 8,9 . Therefore, PAHs concentrations in sediments have been widely investigated in the past decades for assessing their risks [10][11][12][13][14][15][16][17][18] . The common method for the quantification of PAHs in sediments is chromatography, including gas chromatography or liquid chromatography, coupled with mass spectrometry [19][20][21] . However, the whole process of chromatographic analysis for PAHs in sediments is tedious and cost-consuming. Moreover, this method may be potentially damaging for the environment because the analysis typically requires a preconcentration procedure, which may use large volumes of organic solvents for extraction and clean-up [22][23][24] . For example, the frequently used organic solvents, dichloromethane [19][20][21] , can damage human nervous system and even the functions of liver and kidney through skin mucosa and nasal breathing 25,26 . Therefore, it is necessary to establish correlations that can be applied to predict concentrations of PAHs in sediments for cutting costs and saving time in laboratory analysis 27,28 .
In recent literatures, significantly positive correlations between the total concentration of PAHs (C ƩPAHs ) and the content of total organic carbon (f oc ) in sediments were established to predict C ƩPAHs Scientific Reports | (2022) 12:3334 | https://doi.org/10.1038/s41598-022-07312-2 www.nature.com/scientificreports/ possible reason for the insignificant correlation between C ƩPAHs and f oc when more data was introduced is that the difference in emissions of PAHs in various regions is ignored. Another possible reason is that the dependence of PAHs partitioning in sediment organic matter and their polarity is ignored 4,32,33 . Furthermore, nonlinear sorption of PAHs on sediments organic matter is also ignored 34,35 . Intrinsic quantitative relationships between C ƩPAHs and the concentrations of single PAHs congener were also established to predict sediment C ƩPAHs in previous studies [36][37][38][39] . For example, the concentration of benzo(a) pyrene (C BaP ) 36,37 , pyrene (C Pyr ) 38 or acenaphthene (C Ace ) 39 was suggested to predict C ƩPAHs (Table S3). However, when relationships of C ƩPAHs with C BaP (Eq. 2), C Pyr (Eq. 3) and C Ace (Eq. 4) were established using the additional sediment concentration data of PAHs from China (Table S4), it was found that the relationships were less significant with greater deviation (Eqs. [2][3][4]. For example, R 2 of the linear relationship between C ƩPAHs and C Ace (Eq. 4) reduced from 0.82 39 to 0.49 (Eq. 4) with SDEV increased from 27% 39 (Table S3) to 461% (Eq. 4), when sediment sample numbers (N) increased from 10 39 (Table S3) to 754 (Eq. 4), respectively. A possible reason for these relationships (Eqs. 2-4) predicted with less accuracy is that the difference in emission factors (EFs) of PAHs congeners for various sources is ignored. For example, at the equivalent emission factors of the total PAHs (EF ƩPAHs ), which is 4.39 g t −1 for iron sintering and 4.51 g t −1 for gasoline combustion (Table 1), EF of Ace (Table 1) from iron sintering is 0.079 g t −1 , about 2 orders of magnitude larger than that 0.00046 g t −1 of gasoline combustion 40,41 . Therefore, the concentrations of single PAHs congener cannot be used to accurately predict C ƩPAHs in sediments on a large scale. The characteristic PAHs congeners in emission sources and in sediments should be explored to develop an accurate model for predicting C ƩPAHs in sediments.
(1) www.nature.com/scientificreports/ In this study, a multiple linear relationship between EF ΣPAHs and the EFs of characteristic congeners was established by identifying characteristic PAHs congeners in emission sources. Moreover, another multiple linear relationship between C ΣPAHs and the concentrations of characteristic congeners in sediments was established by identifying characteristic PAHs congeners in sediments. Finally, an accurate model for predicting C ΣPAHs in sediments was established by exploring the correlation between the sediment concentrations and the emissions of characteristic PAHs congeners. The established model can cut costs and save time in PAHs analysis for risk assessing of PAHs in the sediment environment.

Result and discussion
Characteristic congeners of PAHs in emission. Hierarchical clustering analysis (HCA) and classifications for relative similarities 42,43 of PAHs emission factors (Table S5) show that sixteen PAHs can be divided into four groups (Table 1, Fig. 2). The first subgroup is Nap (Table 1, Fig. 2) because of its highest EFs in most emission sources (Table S5). Acy is the second subgroup (Table 1, Fig. 2) with significant lower EFs than that of Nap but higher than other PAHs congeners in most emission sources (Table S5). The third subgroup comprises four 3-ring PAHs (Ace, Flo, Phe and Ant) and two 4-ring PAHs (Pyr and Flu) (Fig. 2, Table 1). In this subgroup, the EFs of Phe (EF Phe ) has the best linear correlation with the total EFs of this subgroup, showing the maximum R 2 of 0.98 (N = 15, p < 0.01) ( Table 1). Thus, the total EFs of the third subgroup can be expressed by EF Phe with the largest degree of accuracy 43,44 . The last subgroup is composed of the other eight PAHs congeners (Fig. 2, Table 1), including 4-ring PAHs (BaA and Chr) and 5, 6-ring PAHs (BbF, BkF, BaP, IcdP, DahA and BghiP). The EFs of BaA (EF BaA ) correlates best with the total EFs of the eight PAHs in this subgroup with the maximum R 2 of 0.98 (N = 15, p < 0.01) ( Table 1). This indicates that the total EFs of the last subgroup can be presented by EF BaA . Moreover, the total EFs of sixteen PAHs (EF ƩPAHs ) are well related with EF Nap , EF Acy , EF Phe , and EF BaA in a multilinear relationship (Eq. 5 and Fig. 3), having R 2 of 0.99, high F values of 5257, and low SDEV of 24%. Therefore, Nap, Acy, Phe, and BaA can be employed as characteristic congeners of sixteen PAHs in emission sources (Fig. 3).     Fig. 4. When PAHs are classified into two groups, the first subgroup is composed of 2, 3-rings PAHs and two 4-rings PAHs ( Table 2). In this subgroup, the concentration of Phe (C Phe ) correlates best with the total concentration of the eight congeners with the maximum R 2 of 0.83 (N = 754, p < 0.01) ( Table 2). The other two 4-rings PAHs and 5, 6-rings PAHs are divided into the second subgroup ( Table 2). In the second subgroup, the maximum value of R 2 was found between the concentration of BaA (C BaA ) and the total concentration of the eight congeners (R 2 = 0.79, N = 754, p < 0.01) ( Table 2). Thus, the total concentration of PAHs in two subgroups can be expressed by C Phe and C BaA with the largest degree of accuracy,  www.nature.com/scientificreports/ respectively 43,44 . Relationship of C Phe and C BaA with C ƩPAHs was established in Eq. (6). Similarly, when 16 PAHs were classified into three groups, the concentration of Nap (C Nap ), Phe (C Phe ), and BaA (C BaA ) correlate best with the total concentrations of congeners in corresponding subgroup ( Table 2). Relationship of C Nap , C Phe , and C BaA with C ƩPAHs was established in Eq. (7). When sixteen PAHs were classified into four groups, C Nap , C Acy , C Phe , and C BaA correlate best with the total concentrations of congeners in corresponding subgroup ( Table 2). Relationship of C Nap , C Acy , C Phe , and C BaA with C ƩPAHs was established in Eq. (8). When sixteen PAHs were classified into five groups, C Nap , C Acy , C Phe , C BaA , and the concentration of DahA (C DahA ) correlate best with the total concentrations of congeners in corresponding subgroup ( Table 2). Relationship of C Nap , C Acy , C Phe , C BaA and C DahA with C ƩPAHs was established in Eq. (9). Correlations between the calculated C ƩPAHs (C ƩPAHs(cal) ) and the experimental value (C ƩPAHs(exp) ) are presented in Fig. 5. SDEV values between C ƩPAHs(cal) and C ƩPAHs(exp) in Fig. 5a-d were 124%, 75%, 35% and 37%, respectively. SDEV values decreased significantly from Fig. 5a-c, while almost remained constant from Fig. 5c, d. Intercepts in equations presented the same tendency to SDEV value (Fig. 5). C ƩPAHs can't be accurately predicted using two (Fig. 5a) or three congeners (Fig. 5b), especially when C ƩPAHs in sediments lower than the intercepts in Eqs. (6)(7). Four (Fig. 5c) or five congeners (Fig. 5d) can accurately predict C ƩPAHs . However, more work needs to be done to complete the prediction of five congeners than four congeners. In summary, C ƩPAHs can be well predicted from the concentration of Nap, Acy, Phe, and BaA using the linear relationship of Eq. (8) (Fig. 5c).     www.nature.com/scientificreports/ (C ∑PAHs ) in sediments can be predicted from the concentrations of four characteristic congeners with high R 2 but low SDEV value. Figure 5c shows that C ∑PAHs(cal) are consistent with C ∑PAHs(exp) in sediments sampled in China. Scatterplots of unstandardized residuals (the difference between C ƩPAHs(cal) and C ƩPAHs(exp) ) with C Nap , C Acy , C Phe or C BaA in sediments distributed regularly on both sides of the horizontal line and no obvious positive or negative trend existed (Fig. S1). This indicates the significant stability of the established linear relationship 45 . Furthermore, C ƩPAHs(cal) also agreed well with C ƩPAHs(exp) using additional concentration data (Table S6) in sediment samples from elsewhere around the globe (N = 691) (Fig. 6), suggesting that the relationship can be applied to predict C ƩPAHs in sediments that are not only localized in China. Therefore, Nap, Acy, Phe, and BaA can be also employed as characteristic congeners of sixteen PAHs in sediments. The same characteristic congeners observed for PAHs in sediments and in emission sources indicates that the concentration of PAHs in sediments are largely depended on their emission, which was consistent with the reported results [46][47][48] . In previous studies 17,29-31 , PAHs emissions were not involved in the relationship of predicting PAHs concentration in sediments using f oc (Eq. 1), which does not hold true in some cases. For example, for a given f oc , C ƩPAHs in sediments can vary by 1-3 orders of magnitude (Fig. 1a) because of the difference in PAHs emissions in various regions. For another example, although the mean f oc (6.4%) in sediments from the upper reach of Huaihe River 50 was higher than that in sediments from the lower reach of Huaihe River (f oc = 4.1%) 51 , C ΣPAHs in sediments from the lower reach of Huaihe River (mean value = 1721.7 µg kg −1 ) 50 were higher than that from the upper reach (mean value = 400.5 µg kg −1 ) 51 . This can be attributed to the higher emission intensity of PAHs in lower reach of Huaihe River region than that in the upper reach region 41,52,53 . Moreover, a significantly positive correlation between mean concentrations of thirteen PAHs (except for Nap, Acy, and Ace) in sediment samples derived from globe (N = 1445) and their mean EFs in fifteen emission divisions was also observed (Fig. S2), indicating that the concentration of PAHs in sediments also mainly depends on the PAHs emission. The deviation of Nap, Acy, and Ace from the linear relationship in Fig. S2 can be attributed to their relatively low logK ow but high S w (Table S1), making them not be readily adsorbed by organic matters in sediments but tend to be more readily dissolved in water 48 .

Relationship between PAHs concentrations in sediments and EFs in emission sources.
Significance of multicomponent coefficients in Eqs. (5) and (8) were less than 0.05 (p < 0.05), which are statistically significant. However, significance of intercept in Eq. (5) was greater than 0.05 (p > 0.05), which is statistically insignificant. The significant intercept (p < 0.05) in Eq. (8) can be assigned to background concentrations of PAHs in sediments 54,55 . Moreover, significantly positively linear relationships of the multicomponent coefficients in Eq. (5) and that in Eq. (8) with the logK ow of four characteristic congeners were observed (Fig. 7a). Interestingly, the coefficient of characteristic congeners with larger logK ow , such as BaA, in Eq. (8) are higher than that in Eq. (5) (Fig. 7a). This could be attributed to the influence of sorption of PAHs in sediments and their biodegradation in the environment. For PAHs with larger logK ow , they tend to be more readily adsorbed in sediments organic matter by partitioning than the PAHs with smaller logK ow 56,57 . Meanwhile, PAHs congeners with relatively low logK ow tend to be more readily degraded than those with relatively high logK ow 58,59 , presented by the positively linear relationship of PAHs logK ow with their biodegradation half-life (Fig. S3). Therefore, a positively linear relationship between the ratio of multicomponents coefficient from the multiple linear relationship in sediments (Eq. 8) to that from the multiple linear relationship in emission sources (Eq. 5) and the logK ow of four PAHs congeners can be observed in Fig. 7b. This suggests that the distribution of PAHs in sediments could www.nature.com/scientificreports/ also be dependent on their environmental behaviors including sorption and biodegradation in addition to their emissions. In previous study 46 , significant linear relationships between the concentrations of sixteen PAHs in sediments (C PAHs ) with their emissions (E PAHs ) were established (Eq. 10). Moreover, positive and negative relationships of K (Eq. 11) and L (Eq. 12) with logK ow were established, respectively.
In this study, a multilinear relationship of C ΣPAHs with C Nap , C Acy , C Phe , and C BaA was established (Eq. 8). The concentration of four characteristic PAHs congeners in sediments can be calculated using their emissions and logK ow (Eqs. [10][11][12]. Therefore, C ΣPAHs in sediments can be predicted using the emissions and logK ow of four characteristic PAHs congeners, in which logK ow can be accounted for PAHs partition ability. Mean C ∑PAHs(exp) in surface sediments sampled in investigated provinces of China (N = 30) and other countries (N = 21) versus C ∑PAHs(cal) predicted using E PAHs (Tables S7 and S8) and logK ow of four characteristic PAHs congeners is presented in Fig. 8. The SDEV value between C ∑PAHs(exp) and C ∑PAHs(cal) is 54%, suggesting that C ∑PAHs(cal) are well consistent with C ∑PAHs(exp) . Therefore, the established model in this study can be used to predict C ΣPAHs in sediments with high accuracy, resulting in decreasing cost of laboratory analysis.
The correlations that have been previously described in literature of C ƩPAHs with C BaP 36,37 , C Pyr 38 or C Ace 39 could be attributed to the PAHs emissions in the investigated region from one emission source or emission sources with similar EFs (Table 1). For example, PAHs in sediment samples of Norway were mainly from manufactured gas plants and aluminum smelters 38 , in which Pyr is the dominant congener with relatively high EFs 40,41,52 . However, the multiple linear relationship established herein (Eq. 6) gives a useful way to predict the C ƩPAHs in sediments using PAHs emissions emitted from major emission sources around the world as seen by the good correlation with the large and diverse sample size. Therefore, this relationship would be valuable for predicting total PAHs concentrations and assessing their risks in sediments.

Conclusion and perspectives
A multiple linear relationship of C ∑PAHs with C Nap , C Acy , C Phe , and C BaA in sediments was established employing the reported data in the past 30 years. This suggested the selected four PAHs congeners, including Nap, Acy, Phe, and BaA, are the characteristic congeners in sediments. Moreover, the multiple linear relationship of EF ∑PAHs with the EFs of the four congeners was also developed. The same characteristic congeners observed for PAHs in sediments and in emission sources indicates that the concentration of PAHs in sediments are largely dependent on their emissions. Additionally, the ratio of multicomponents coefficient from the multiple linear relationship in sediments to that from the multiple linear relationship in emission sources correlated positively with logK ow of the four congeners. Therefore, a model for predicting C ΣPAHs in sediments was established using the emissions www.nature.com/scientificreports/ and logK ow of four PAHs congeners. The SDEV value between C ∑PAHs(exp) and C ∑PAHs(cal) was 54%, suggesting the established model can accurately predict C ΣPAHs in sediments.
Although the relationship established in this study could be used to predict total sixteen PAHs concentration in surface sediments of China and other countries, the application of this method for predicting additional PAHs in sediment, such as alkylated-PAHs, needs further verified.

Methodology
Literature search. Concentration data of sixteen parent PAHs (Table S4 and Table S6) in global bottom sediments of fresh water reported in the past 30 years were collected. A systematic literature retrieval was performed using the ISI Web of Science database, Google®Scholar, WanFang Data of E-Resources and China Knowledge Resource Integrated Database including master/doctoral dissertation using the terms of "polycyclic aromatic hydrocarbons" or "PAHs" and "sediment/sediments" as the primary keywords 60 . Articles were then examined individually to ensure that the duplicates and irrelevant articles were excluded from further analysis. In addition, articles without individual PAHs concentration data and/or articles that did not report QA/QC procedure and limits of detection (LODs) were also excluded from further analysis 60 . In order to perform the required comparisons in this study, it was assumed that there were no significant differences in the sampling process and analysis among the investigating groups/laboratories 60,61 . In total, 22,349 individual PAH concentrations from 1445 sediments samples were collected from 1184 publications and then used for meta-analysis (Table S4 and Table S6).

PAHs emissions.
According to the previous study 40 , PAHs emissions in globe was primarily emitted from coking production, petroleum refineries, domestic and industrial coal combustion, straw and firewood burning, iron-steel industry, transport petroleum, and primary Al production, which accounting for more than 92% of all source contributions. Therefore, PAHs emissions from these nine sources were calculated using a previously reported approach for analysis 40,41 . Emission factors (EFs, g t −1 ) of sixteen PAHs from above nine emission sources were summarized in Table S5. Provincial and national PAHs emissions (E PAHs , t a −1 ) were calculated using Eq. (13) 40,41,52 : where i, j, and k represent each PAH congener, sources, and technology, respectively. EF (g t −1 ) is the emission factor of the PAHs congener i (Table S5). X is the fraction of the activity rate contributed by a given technology j, which was calculated using the technology split method 40,41,52 . Activity data For the technology splitting approach, six sources (coking production, industrial coal combustion, indoor straw and firewood burning, iron-steel industry, and primary Al production) were divided into two or three  www.nature.com/scientificreports/ divisions with or without different emission mitigation measures 40,41 . For the remaining three sources, fixed EF PAHs without divisions were used 40,41 . The time-dependent fractions of technology divisions were calculated using a series of S-shaped curves (Eq. 14, Table S9).
where X 0 and X f are initial and final fractions of a certain technology division, respectively. t 0 is the start time of technology transition, and s is a rate. The X f , X 0 , t 0 , and s were illustrated in Table S9 40 .
Data analysis. 11,937 individual concentrations data from 754 sediments sampled from China were used to establish the relationship between C ƩPAHs and the concentration of selected congeners in sediments (Table S4).
To validate the established relationship, 10,412 individual concentrations data from 691 sediments sampled from globe (excluding China) were used as a test set (Table S6). Concentrations that were reported to be below the method LODs were assigned half the value of the reported LODs 61,62 . Concentration unit of PAHs were set uniformly to micrograms per kilogram (µg kg −1 ) of dry weight. Prior to statistical analysis, a histogram with normal curve was viewed and a Kolmogorov-Smirnov test were performed to verify the normality of variables 38 . If the significance (p) is greater than 0.05 (p > 0.05), it can be judged that the variables are normal 38 . Hierarchical clustering analysis (HCA) and classifications were performed using the SPSS Statistics 19.0 software (Version 19.0, Chicago, IL, USA) according to relative similarities of EFs in emission sources and concentrations in sediments of sixteen PAHs 42,43 . Statistical analysis, including linear and multilinear regression, were also performed using the SPSS Statistics 19.0 with a critical significance (p) up to 0.05 to check significance. Provincial or national PAHs emissions combined with their observed C ∑PAHs in surface sediments were used to evaluate the established model between C ∑PAHs and emissions of the four characteristic congeners. Moreover, C ∑PAHs in surface sediments sampled in the same province or nation at the same year were expressed by the geometric mean value 63 . Percent sample deviation (SDEV, Eq. (15)) was calculated based on the relative error between the experimental values (C exp ) and the calculated value (C cal ) 64 . In addition to the SDEV, significance of F test (p) and correlation coefficient (R 2 ) were used to evaluate the goodness of the fitting and the established correlations by regression analysis 64 .
where N is the number of experimental values and k is the number of predictors for linear regression.